Bug 3392774 - Large define lists crash recent NASM
Summary: Large define lists crash recent NASM
Status: CLOSED FIXED
Alias: None
Product: NASM
Classification: Unclassified
Component: Assembler
Version: 2.16.xx
Hardware: Other x86 Linux
Importance: Medium severe
Assignee: nobody
Reported: 2021-08-08 10:27 PDT by E. C. Masloch
Modified: 2022-11-21 12:10 PST

Obtained from: Built from git using configure


Description E. C. Masloch 2021-08-08 10:27:04 PDT
Trying to build symbolic lDebugX using makex or makexd of the current symbolic branch of https://hg.pushbx.org/ecm/ldebug results in NASM being killed by the OS (on an amd64 Linux Debian 10 server). I was able to determine that a long define appears to be the cause of this behaviour. Here's a test case that works on all three tested NASM versions with TEST=0 but fails on the two recent versions with TEST=10, which is the default:

test$ cat test.asm

%if 0

NASM bug test case

2021 by C. Masloch

Usage of the works is permitted provided that this
instrument is retained with the works, so that any entity
that uses the works is notified of this instrument.

DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.

%endif

%define PATCH_386_TABLE ""

                ; Instruction if no 386+ CPU
        %macro _no386 0-1+.nolist
%push
%assign %$entry $+CODESECTIONFIXUP
                %1              ; write instruction

%rep ($+CODESECTIONFIXUP) - %$entry
                                ; count size of instruction
 %define PATCH_386_TABLE %[PATCH_386_TABLE],%[%$entry]  ; write a patch for each byte
 %assign %$entry %$entry+1
%endrep
%pop
        %endmacro


        org 0

%define CODESECTIONFIXUP -code_start+0
code_start:
%ifndef TEST
 %assign TEST 10
%endif

        _no386 times 3030 + TEST nop





%xdefine PATCH_386_TABLE PATCH_386_TABLE,5800,5801,5802,5815,5820,6091,6137,6165,6174,6180,6217,6233,6414,7328,7329,7330,7331,7488,7538,7636,8078,8393,8394,8395,8396,8397,8398,8399,8400,8401,8402,8403,8404,8405,8406,8407,8408,9227,9625,9644,9681,9711,9713,9716,9737,9740,9741,9742,9743,9744,9748,9751,9752,9753,9754,9755,9756,9757,9758,9942,10178,10179,10180,10181,10182,10183,10184,10185,10186,10187,10985,10997,11596,11638,11639,11640,11653,11654,11655,11656,11657,11658,11659,11660,11661,11910,11911,11912,11913,11914,11915,11916,11917,11918,11919,11960,13257,13265,13270,13292,13293,13294,13295,13296,13297,13298,13299,13300,13301,13302,13352,13416,13464,13465,13466,13469,13470,13471,13506,13524,13545,13564,13574,13575,13576,13577,13578,13579,13580,13605,13623,13625,13627,13629,13648,13650,13652,13654,13698,13699,13700,13701,13881,13923,13924,13925,13938,13939,13940,13941,13942,13943,13944,13945,13946,14712,14779,14780,14781,14782,14787,14788,14789,14790,14791,14792,14793,14794,14795,14796,14849,14854,15015,15058,15059,15060,15061,15078,15079,15080,15081,15082,15087,15088,15089,15165,15167,15169,15171,15178,15179,15180,15181,15182,15183,15184,15185,15186,15291,15301,15306,15307,15308,15309,15320,15321,15327,15328,15329,15330,15331,15332,15333,15334,15335,15336,15337,15338,15339,15340,15341,15342,15343,15344,15345,15346,15347,15348,15349,15350,15353,15428,15514,15522,15523,15524,15525,15526,15563,16232,16564,16565,16566,16567,16568,16569,16570,16571,16611,16612,16671,16676,16681,16704,16705,16706,16707,16708,16709,16739,16740,17147,17148,17158,17160,17193,17195,17546,17558,17720,17725,17728,17734,17772,17777,17782,17784,17794,17796,17798,17802,17805,17808,17811,17843,17846,17889,17928,17929,17930,17931,17932,17933,17934,17935,17936,17937,17938,18186,18187,18188,18189,18190,18191,18192,18193,18194,18505,18506,18507,18508,18509,18510,18511,18512,18513,18514,18515,18516,18517,18781,19274,19646,19651,19656,19683,19704,19705,19706,19707,19708,19709,19710,19711,19712,19713,19714,20009,20226,20273,20274,20278,20282,20286,20287,20291,20295,20299,20300,20304,20308,20312,20313,20317,20321,21598,21604,21631,22083,22084,22085,22086,22087,22088,22089,22090,22091,22092,22093,22094,22095,22096,22322,22325,22361,22803,22805,22810,22819,22821,22824,22831,22833,22838,22847,22849,22852,22859,22868,22878,22962,23071,23072,23073,23074,23075,23083,23084,23212,23222,23223,23224,23245,23255,23256,23257,23258,23259,23260,23270,23271,23272,23279,23286,23288,23299,23328,23329,23330,23336,23338,23350,23351,23352,23370,23371,23372,23373,23485,23492,23495,23496,23497,23508,23546,23555,23815,23816,23817,23818,23819,29190,29608,30573,30574,30575,30576,30577,30578,32975,32977,32979,32981,32983,32985,32987,33012,33014,33016,33018,33020,33022,33024,33172,33173,33174,33175,33176,33177,33178,33179,33180,33181,33182,33183,33184,33185,33186,33187,33188,33189,33190,33191,33271,33272,33273,33274,33275,33276,33277,33295,33299,33303,33307,33311,33315,33377,33378,33379,33380,33400,33404,33408,33412,33416,33420,33434,34013,34027,34030,34076,34079,34121,34576,34577,34578,34579,34580,34581,34582,34583,34584,34585,34586,34607,34608,34609,34610,34611,34612,34613,34614,34615,34616,34617,45818,46072,46075,46078,46080,46107,46110,46133,46134,46135,46136,46139,46140,46141,46145,46147,47058,47059,47060,47061,47377,47378,47379,47380,47474,47475,47476,47477,48272,48289,48292,48296,48299,48305,48307,48310,48313,48749,48759,48760,48761,48762,48796,50174,50183,50185,50227,50230,50265,50271,50272,50273,50274,50304,50305,50306,52079,52151,52164,52180,52188,52189,52190,52209,52281,52286,52306,52456,52459,52461,52476,52478,52481,52491,52502,52503,52504,52674,52675,52676,52708,52709,52710,52711,52712,52724,52725,52726,52727,52728,52742,52753,52758,52759,52760,52761,52762,52763,52764,52765,52766,52767,52768,52769,52770,52771,52772,52773,52774,52775,52776,52777,52778,52779,52780,52781,52782,52783,52784,52785,52786,52787,52788,52789,52803,52804,52805,52816,52822,52825,52852,52853,52854,52855,52903,52904,52905,52917,52919,52995,52996,52997,52998,52999,53000,53001,53002,53003,53658,53659,54547,54548,54682,54688,54689,54690,54691,54692,54693,54694,54695,54696,54697,54698,54699,54700,54701,54702,54703,54704,54705,54706,54707,54708,54709,54710,54711,54712,54713,54714,54715,54716,54717,56482,56499,56504,56980,56981,56982,56990,56991,56992,56995,57178,57179,57180,57229,57240,57241,57242





%macro count 0-*
%warning %0
%endmacro
count PATCH_386_TABLE
%defstr string PATCH_386_TABLE
%strlen length string
%warning length
test$ oldnasm -v
NASM version 2.14.03rc2 compiled on Aug 31 2019
test$ oldnasm test.asm
test.asm:56: warning: (count:1) 3776 [-w+user]
test.asm:54: ... from macro `count' defined here [-w+user]
test.asm:59: warning: 18442 [-w+user]
test$ nasm -v
NASM version 2.15.03 compiled on Dec 28 2020
test$ nasm test.asm
Killed
test$ newnasm -v
NASM version 2.16rc0 compiled on Aug  8 2021
test$ newnasm test.asm
Killed
test$ oldnasm test.asm -DTEST=0
test.asm:56: warning: (count:1) 3766 [-w+user]
test.asm:54: ... from macro `count' defined here [-w+user]
test.asm:59: warning: 18392 [-w+user]
test$ nasm test.asm -DTEST=0
test.asm:56: warning: 3766 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 18392 [-w+user]
test$ newnasm test.asm -DTEST=0
test.asm:56: warning: 3766 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 18392 [-w+user]
test$
Comment 1 E. C. Masloch 2022-08-25 09:19:49 PDT
We were able to verify that this crash is caused by the OS's OOM killer. (For some reason the server did not write log messages in the places that my partner expected to find them, which did not help us.)

This is the same test case as initially, also available at https://pushbx.org/ecm/test/20220825/test.asm

Here's the test log. The /usr/bin/time executable is GNU time. Its %M format code reports the maximum resident set size of the process in KiB.

test/20220825$ nasm -v
NASM version 2.16rc0 compiled on Aug 23 2022
test/20220825$ nasm test.asm -o /dev/null -DTEST=1000
Killed
test/20220825$ /usr/bin/time --format="%M\n" nasm test.asm -o /dev/null -DTEST=1000
Command terminated by signal 9
3569956

test/20220825$ ~/proj/nasmtest/nasm -v
NASM version 2.16rc0 compiled on Aug 25 2022
test/20220825$ ~/proj/nasmtest/nasm test.asm -o /dev/null -DTEST=1500
test.asm:56: warning: 5266 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 25892 [-w+user]
test/20220825$ /usr/bin/time --format="%M\n" ~/proj/nasmtest/nasm test.asm -o /dev/null -DTEST=1500
test.asm:56: warning: 5266 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 25892 [-w+user]
3313900

test/20220825$ /usr/bin/time --format="%M\n" ~/proj/nasmtest/nasm test.asm -o /dev/null -DTEST=1000
test.asm:56: warning: 4766 [-w+user]
test.asm:54: ... from macro `count' defined here
test.asm:59: warning: 23392 [-w+user]
2623884

test/20220825$
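
The peak memory figures in the log above come from GNU time's %M. For reference, here is a minimal C sketch of taking a similar measurement directly, assuming Linux, where getrusage() reports ru_maxrss in KiB. The function name run_and_peak_rss_kib is hypothetical and not part of NASM or GNU time.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with the given arguments, wait for it to finish, and
 * return the peak resident set size over all children waited for so
 * far. On Linux ru_maxrss is in KiB (a platform assumption; other
 * systems use different units). Returns -1 on error.
 */
long run_and_peak_rss_kib(char *const argv[])
{
    struct rusage ru;
    int status;
    pid_t pid;

    assert(argv && argv[0]);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (getrusage(RUSAGE_CHILDREN, &ru) < 0)
        return -1;
    return ru.ru_maxrss;
}
```

Calling it with an argument vector such as { "nasm", "test.asm", "-o", "/dev/null", NULL } should give a number comparable to the %M output, including when the child is killed by a signal.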


The nasm executable is this revision:

https://github.com/netwide-assembler/nasm/commit/3aebb20f123033dcd767f0abc46b18cbefed8091

With the following bugs patched:

https://bugzilla.nasm.us/show_bug.cgi?id=3392732

https://bugzilla.nasm.us/show_bug.cgi?id=3392803


The ~/proj/nasmtest/nasm executable is based on the same revision with these additional bugs patched:

https://bugzilla.nasm.us/show_bug.cgi?id=3392804

https://bugzilla.nasm.us/show_bug.cgi?id=3392805


The diff that fixes only *this* bug is as follows:

diff --git a/asm/preproc.c b/asm/preproc.c
index 0ff2b518..fed1cc39 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -162,7 +162,38 @@ static bool is_smac_param(enum token_type toktype)
  * is incorrect, as some token types strip parts of the string,
  * e.g. indirect tokens.
  */
+#if 0
 #define INLINE_TEXT (7*sizeof(char *)-sizeof(enum token_type)-sizeof(unsigned int)-1)
+#define TOKENPACKED
+#elif 0
+/*
+ * The minimum size is enough to hold "%00" and ".nolist",
+ * as these are compared directly to the Token.text.a field.
+ * Further, to have Token.text.p.pad be at least one byte,
+ * INLINE_TEXT must be at least sizeof(char *) long which is
+ * equal to 8 for long mode.
+ */
+#define INLINE_TEXT 8
+/*
+ * If the structures aren't specified as packed the compiler
+ * will expand struct Token to 32 bytes regardless it appears.
+ * So to minimise memory usage, pack the structures.
+ */
+#define TOKENPACKED __attribute__((packed))
+#else
+/*
+ * Setting the token structure size to 32 bytes appears to be
+ * sufficient to build the lDebug application, hg 7016dd710698,
+ * with the options -D_SYMBOLIC -D_DUALCODE -D_SYMBOLASMDUALCODE
+ * as well as -D_DEBUG -D_PM (lDDebugX build, with symbolic
+ * option and dual code segments).
+ *
+ * 64 bytes, the prior default for building the assembler for
+ * long mode, resulted in the assembler being OOM killed.
+ */
+#define INLINE_TEXT (32-sizeof(char *)-sizeof(enum token_type)-sizeof(unsigned int)-1)
+#define TOKENPACKED
+#endif
 #define MAX_TEXT (INT_MAX-2)
 
 struct Token {
@@ -171,12 +202,12 @@ struct Token {
     unsigned int len;
     union {
         char a[INLINE_TEXT+1];
-        struct {
+        struct TOKENPACKED {
             char pad[INLINE_TEXT+1 - sizeof(char *)];
             char *ptr;
         } p;
     } text;
-};
+} TOKENPACKED;
 
 /*
  * Note on the storage of both SMacro and MMacros: the hash table
Comment 2 E. C. Masloch 2022-08-25 09:28:31 PDT
It seems like this is the revision which introduced packing text inline into tokens: https://github.com/netwide-assembler/nasm/commit/8571f06061b47471a340e350fdfcd804098637d6

Before this, a token with short text (e.g. a comma, or a decimal number below "65536") took up only the token structure (including a pointer to the text) plus exactly as many bytes as were required to hold the text. After this patch, each token unconditionally takes up at least 64 bytes (when the assembler is compiled for amd64 long mode).
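
To illustrate where the 64 and 32 byte figures come from, here is a rough sketch that rebuilds the token layout from the diff in comment 1. The leading next pointer and type field are my assumption about the part of struct Token that the diff hunk does not show, and the names token_wide, token_small and INLINE_CAP are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

enum token_type { TOKEN_DEMO };   /* stand-in for NASM's real enum */

/* Inline text capacity for a token head that is `total` bytes long. */
#define INLINE_CAP(total) \
    ((total) - sizeof(char *) - sizeof(enum token_type) \
     - sizeof(unsigned int) - 1)

/* Declare a token head type with the assumed field layout. */
#define DEFINE_TOKEN(name, total)                                     \
    struct name {                                                     \
        struct name *next;                                            \
        enum token_type type;                                         \
        unsigned int len;                                             \
        union {                                                       \
            char a[INLINE_CAP(total) + 1];                            \
            struct {                                                  \
                char pad[INLINE_CAP(total) + 1 - sizeof(char *)];     \
                char *ptr;                                            \
            } p;                                                      \
        } text;                                                       \
    }

/* Prior default: 7 pointers' worth of text area plus the next pointer. */
DEFINE_TOKEN(token_wide, 8 * sizeof(char *));
/* Patched layout from comment 1: 32 bytes in total. */
DEFINE_TOKEN(token_small, 32);

size_t token_wide_size(void)  { return sizeof(struct token_wide); }
size_t token_small_size(void) { return sizeof(struct token_small); }
```

On an LP64 target (8 byte pointers, 4 byte enum and unsigned int) the first layout should come out at 64 bytes and the second at 32, which matches the numbers discussed here.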

Ideally I'd like to create a patch that allows a run time selection of the token size (64 vs 32, as 32 bytes suffice to build lDDebugX symbolic) but that seems more complicated. Is there interest in such a patch? If not, can NASM unconditionally use 32 bytes for the token structure instead of the calculation that led to 64 bytes in long mode?

Or should I patch and build NASM separately for my use case?
Comment 3 E. C. Masloch 2022-08-25 16:46:45 PDT
The inline text tokens don't appear to be the only pessimisation between older and recent NASM. Observe:

ldebug/source$ nasm -v
NASM version 2.16rc0 compiled on Aug 23 2022
ldebug/source$ /usr/bin/time --format="%M KiB" nasm debug.asm -I../../lmacros/ -I../../symsnip/ -I../../scanptab/ -o tmp.bin -D_DEBUG -D_PM -D_SYMBOLIC -D_DUALCODE=1 -D_SYMBOLASMDUALCODE
Command terminated by signal 9
3541284 KiB
ldebug/source$ ~/proj/nasmtest/nasm -v
NASM version 2.16rc0 compiled on Aug 25 2022
ldebug/source$ /usr/bin/time --format="%M KiB" ~/proj/nasmtest/nasm debug.asm -I../../lmacros/ -I../../symsnip/ -I../../scanptab/ -o tmp.bin -D_DEBUG -D_PM -D_SYMBOLIC -D_DUALCODE=1 -D_SYMBOLASMDUALCODE
asmtabs.asm:407: warning: Most assembler table prefix bytes: 1 (ofs 4h) mne BOXCB variant (240h + 0*8 + 7),85,, [-w+user]
expr.asm:2843: warning: word data exceeds bounds [-w+number-overflow]
init.asm:1432: warning: patch_no386_table: 946 (Method 2) [-w+user]
init.asm:1432: warning: 1B=318 repo=46 run=426 byte=996 [-w+user]
init.asm:1437: warning: patch_386_table: 50 (Method 2) [-w+user]
init.asm:1437: warning: 1B=4 repo=11 run=13 byte=59 [-w+user]
2560268 KiB
ldebug/source$ oldnasm -v
NASM version 2.14.03rc2 compiled on Aug 31 2019
ldebug/source$ /usr/bin/time --format="%M KiB" oldnasm debug.asm -I../../lmacros/ -I../../symsnip/ -I../../scanptab/ -o tmp.bin -D_DEBUG -D_PM -D_SYMBOLIC -D_DUALCODE=1 -D_SYMBOLASMDUALCODE 2>&1 | grep -v "warning: word data exceeds bounds"
asmtabs.asm:407: warning: Most assembler table prefix bytes: 1 (ofs 4h) mne BOXCB variant (240h + 0*8 + 7),85,, [-w+user]
init.asm:1432: warning: (writepatchtable:73) patch_no386_table: 946 (Method 2) [-w+user]
init.asm:1432: warning: (writepatchtable:74) 1B=318 repo=46 run=426 byte=996 [-w+user]
init.asm:1437: warning: (writepatchtable:73) patch_386_table: 50 (Method 2) [-w+user]
init.asm:1437: warning: (writepatchtable:74) 1B=4 repo=11 run=13 byte=59 [-w+user]
714240 KiB
ldebug/source$


Unpatched (nasm) gets OOM killed at about 3.4 GiB. Patched runs to completion with about 2.4 GiB. The older one needs less than 800 MiB. The resulting binary is identical.
Comment 4 H. Peter Anvin 2022-08-26 07:56:58 PDT
Hmmm... this smells like a failure to reclaim storage to me. Basically malloc/free is likely to have the same kind of overhead as inlining the text, but if token heads aren't getting reused, this is a memory leak, and one which generic tools will not be able to see.

So my strong guess is that either there is a missing token delete somewhere, or the token allocator fails to make a deleted token head available for reuse.
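
A minimal sketch of the kind of block allocator with a free list described above, to show what "available for reuse" means. This is not NASM's actual code; the names tok_alloc, tok_delete and BLOCK_TOKENS are hypothetical. Blocks are deliberately never returned to malloc, so a token head that is never passed to tok_delete stays reachable through its block, which is presumably why generic leak checkers would not flag it.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_TOKENS 4096        /* hypothetical block size */

struct tok {
    struct tok *next;
};

static struct tok *free_toks;    /* heads available for reuse */
static unsigned long live_toks;  /* heads currently handed out */
static unsigned long blocks;     /* blocks obtained from malloc */

struct tok *tok_alloc(void)
{
    struct tok *t = free_toks;

    if (!t) {
        /* Free list is empty: carve a new block into a chain. */
        struct tok *block = malloc(BLOCK_TOKENS * sizeof *block);
        size_t i;

        if (!block)
            abort();
        for (i = 0; i < BLOCK_TOKENS - 1; i++)
            block[i].next = &block[i + 1];
        block[BLOCK_TOKENS - 1].next = NULL;
        blocks++;
        t = block;
    }
    free_toks = t->next;
    t->next = NULL;
    live_toks++;
    return t;
}

/* Put a head back on the free list; returns the token that followed it. */
struct tok *tok_delete(struct tok *t)
{
    struct tok *next;

    assert(t);
    next = t->next;
    t->next = free_toks;
    free_toks = t;
    live_toks--;
    return next;
}

unsigned long tok_live(void)   { return live_toks; }
unsigned long tok_blocks(void) { return blocks; }
```

With this scheme the block count only grows when the number of live heads exceeds everything allocated so far, so a steadily rising live count points at a missing delete rather than at the allocator.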
Comment 5 E. C. Masloch 2022-08-26 13:20:31 PDT
> So my strong guess is that either there is a missing token delete somewhere, or the token allocator fails to make a deleted token head available for reuse.

Wouldn't know about the possibility of leaks. However, the token deletion appears to work as expected.

Anyway, I made a crude patch to two different NASM revisions to compare their use of preprocessor tokens. The newer revision is as described here, based on the commit 3aebb20f123033dcd767f0abc46b18cbefed8091, and patched like this:

diff --git a/asm/preproc.c b/asm/preproc.c
index 7724b12a..203664b5 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -1717,6 +1717,19 @@ static Token *tokenize(const char *line)
     return list;
 }
 
+static FILE * logfile = NULL;
+static unsigned long logamount = 0;
+static unsigned long logtotal = 0;
+static unsigned long logmodulo = 0;
+
+static void openlogfile(void);
+static void openlogfile(void) {
+	if (logfile) return;
+	logfile = fopen("nasmtoka.log", "wb");
+	if (!logfile) nasm_panic("unable to open log file");
+	return;
+}
+
 /*
  * Tokens are allocated in blocks to improve speed. Set the blocksize
  * to 0 to use regular nasm_malloc(); this is useful for debugging.
@@ -1733,6 +1746,13 @@ static Token *tokenblocks = NULL;
 static Token *alloc_Token(void)
 {
     Token *t = freeTokens;
+	openlogfile();
+	logamount++;
+	logtotal++;
+	if ((logmodulo++ & 8191) == 0) {
+		fprintf(logfile, "[%12lu] (%12lu) allocate\n", logamount, logtotal);
+		fflush(logfile);
+	}
 
     if (unlikely(!t)) {
         Token *block;
@@ -1770,6 +1790,12 @@ static Token *alloc_Token(void)
 static Token *delete_Token(Token *t)
 {
     Token *next;
+	openlogfile();
+	logamount--;
+	if ((logmodulo++ & 8191) == 0) {
+		fprintf(logfile, "[%12lu] (%12lu) delete\n", logamount, logtotal);
+		fflush(logfile);
+	}
 
     nasm_assert(t && t->type != TOKEN_FREE);
 

Diff also available at https://pushbx.org/ecm/test/20220826/new.diff

The older revision is based on commit 52266ad42490f48b91a70efb5c2f93ea281eeb60 and patched like this:

diff --git a/asm/preproc.c b/asm/preproc.c
index 95ca56fc..26cf3002 100644
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -1192,6 +1192,19 @@ static void delete_Blocks(void)
     memset(&blocks, 0, sizeof(blocks));
 }
 
+static FILE * logfile = NULL;
+static unsigned long logamount = 0;
+static unsigned long logtotal = 0;
+static unsigned long logmodulo = 0;
+
+static void openlogfile(void);
+static void openlogfile(void) {
+	if (logfile) return;
+	logfile = fopen("nasmtoka.log", "wb");
+	if (!logfile) nasm_panic(0, "unable to open log file");
+	return;
+}
+
 /*
  *  this function creates a new Token and passes a pointer to it
  *  back to the caller.  It sets the type and text elements, and
@@ -1202,6 +1215,13 @@ static Token *new_Token(Token * next, enum pp_token_type type,
 {
     Token *t;
     int i;
+	openlogfile();
+	logamount++;
+	logtotal++;
+	if ((logmodulo++ & 8191) == 0) {
+		fprintf(logfile, "[%12lu] (%12lu) allocate\n", logamount, logtotal);
+		fflush(logfile);
+	}
 
     if (!freeTokens) {
         freeTokens = (Token *) new_Block(TOKEN_BLOCKSIZE * sizeof(Token));
@@ -1229,6 +1249,12 @@ static Token *new_Token(Token * next, enum pp_token_type type,
 static Token *delete_Token(Token * t)
 {
     Token *next = t->next;
+	openlogfile();
+	logamount--;
+	if ((logmodulo++ & 8191) == 0) {
+		fprintf(logfile, "[%12lu] (%12lu) delete\n", logamount, logtotal);
+		fflush(logfile);
+	}
     nasm_free(t->text);
     t->next = freeTokens;
     freeTokens = t;

Diff also available at https://pushbx.org/ecm/test/20220826/old.diff

The command run with either executable is as follows:

nasm debug.asm -I../../lmacros/ -I../../symsnip/ -I../../scanptab/ -o tmp.bin -D_DEBUG -D_PM -D_SYMBOLIC -D_DUALCODE -D_SYMBOLASMDUALCODE

Again, the older revision needs 714356 KiB, the newer one needs 2560204 KiB.

These are the ends of the resulting log files:

ldebug/source$ tail nasmtoka.old
[     9739325] (   378949151) delete
[     9738057] (   378952613) allocate
[     9738415] (   378956888) allocate
[     9738479] (   378961016) delete
[     9738607] (   378965176) delete
[     9738789] (   378969363) allocate
[     9739287] (   378973708) delete
[     9739839] (   378978080) delete
[     9740393] (   378982453) allocate
[     9740661] (   378986683) allocate
ldebug/source$ tail nasmtoka.log
[    71368223] (   436961040) allocate
[    71368229] (   436965139) allocate
[    71368351] (   436969296) delete
[    71368459] (   436973446) delete
[    71369195] (   436977910) delete
[    71369851] (   436982334) allocate
[    71370409] (   436986709) delete
[    71370513] (   436990857) allocate
[    71363959] (   436991676) delete
[    71355767] (   436991676) delete
ldebug/source$

The numbers in square brackets show the number of tokens currently allocated; the numbers in round parentheses show the total number of tokens ever allocated (counting only allocations, not deletions).

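So it does appear that the newer revision leaks more than the older: its log ends with roughly 71.4 million tokens still allocated, versus roughly 9.7 million for the older revision. I'm not sure whether this explains the dramatic memory use increase however.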
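
As a rough cross-check of the live token count against the measured memory use, here is a small sketch of the arithmetic. It assumes that the instrumented newer build uses the 32 byte token heads from the patch in comment 1, which I have not verified separately; the helper name token_heads_kib is made up.

```c
#include <assert.h>

/* Memory held by `tokens` live token heads of `bytes_each` bytes, in KiB. */
unsigned long long token_heads_kib(unsigned long long tokens,
                                   unsigned long long bytes_each)
{
    assert(bytes_each > 0);
    return tokens * bytes_each / 1024;
}
```

Under that assumption, token_heads_kib(71355767, 32) comes to 2229867 KiB, a little over 2.1 GiB, which would account for most of the 2560204 KiB measured for the newer revision.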
Comment 6 E. C. Masloch 2022-08-27 10:54:46 PDT
I used git bisect on the NASM repo, running the following scriptlet to build and test NASM:

$ git clean -x -d -f; touch config/undef.h; ./autogen.sh; ./configure; make; git checkout autoconf; /usr/bin/time --format="%M KiB" ./nasm ~/wwwecm/test/20220825/test.asm

This test resulted in less than 7 MiB of memory use for good revisions, more than 2.8 GiB for bad revisions. I started with https://github.com/netwide-assembler/nasm/commit/52266ad42490f48b91a70efb5c2f93ea281eeb60 as the good revision and https://github.com/netwide-assembler/nasm/commit/3aebb20f123033dcd767f0abc46b18cbefed8091 as the bad revision. First bad revision is https://github.com/netwide-assembler/nasm/commit/de7acc3a46cb3da52464d246b814f8bf059a0360

de7acc3a46cb3da52464d246b814f8bf059a0360 is the first bad commit
commit de7acc3a46cb3da52464d246b814f8bf059a0360
Author: H. Peter Anvin (Intel) <hpa@zytor.com>
Date:   Mon Aug 19 17:52:55 2019 -0700

    preproc: defer %00, %? and %?? expansion for nested macros, cleanups

    BR 3392603: When doing nested macro definitions, we need %00, %? and
    %?? expansion to be deferred to actual expansion time, just as the
    other parameters.

    Do major cleanups to the mmacro expansion code.

    Reported-by: Alexandre Audibert <alexandre.audibert@outlook.fr>
    Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>

 asm/preproc.c | 713 ++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 400 insertions(+), 313 deletions(-)
Comment 7 E. C. Masloch 2022-08-27 11:34:07 PDT
Here's the fix. I don't understand the dup_tlist behaviour enough (yet) to avoid using it, but adding free_tlist fixes the memory leak.

diff --git a/asm/preproc.c b/asm/preproc.c
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -5333,8 +5333,9 @@ static Token *expand_mmac_params(Token * tline)
             tt = tokenize(tok_text(t));
             tt = expand_mmac_params(tt);
             tt = expand_smacro(tt);
-            /* Why dup_tlist() here? We should own tt... */
+            /* *tail = tt; */
             dup_tlist(tt, &tail);
+            free_tlist(tt);
             text = NULL;
             change = true;
             break;
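
To show the leak pattern in isolation, here is a toy sketch with a plain linked list standing in for the token list. The names node_new, list_dup, list_free and nodes_left_after are hypothetical and only mimic the shape of dup_tlist and free_tlist as they are used in the diff above; they are not NASM's implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

static long live_nodes;          /* allocations minus frees */

struct node *node_new(int value, struct node *next)
{
    struct node *n = malloc(sizeof *n);

    if (!n)
        abort();
    n->value = value;
    n->next = next;
    live_nodes++;
    return n;
}

void list_free(struct node *list)
{
    while (list) {
        struct node *next = list->next;

        free(list);
        live_nodes--;
        list = next;
    }
}

/* Append a copy of `list` at **tailp and advance *tailp past the copy. */
void list_dup(const struct node *list, struct node ***tailp)
{
    struct node **tail;

    assert(tailp && *tailp);
    tail = *tailp;
    for (; list; list = list->next) {
        *tail = node_new(list->value, NULL);
        tail = &(*tail)->next;
    }
    *tailp = tail;
}

/*
 * Build a temporary list, copy it to an output list, and optionally
 * free the temporary. Returns how many nodes this call left allocated.
 */
long nodes_left_after(int free_temporary)
{
    long before = live_nodes;
    struct node *out = NULL, **tail = &out;
    struct node *tmp = node_new(1, node_new(2, node_new(3, NULL)));

    list_dup(tmp, &tail);
    if (free_temporary)
        list_free(tmp);
    list_free(out);
    return live_nodes - before;
}
```

Without the free, every call leaves the three temporary nodes behind, which is the same shape as the leak fixed here: the duplicated list is owned by the output, but nobody owns the original any more.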
Comment 8 E. C. Masloch 2022-08-27 11:43:16 PDT
Better patch, recreating the behaviour of dup_tlist then free_tlist without actually duplicating the tokens:

diff --git a/asm/preproc.c b/asm/preproc.c
--- a/asm/preproc.c
+++ b/asm/preproc.c
@@ -5329,12 +5329,17 @@ static Token *expand_mmac_params(Token * tline)
         case TOKEN_INDIRECT:
         {
             Token *tt;
+            Token *teach;

             tt = tokenize(tok_text(t));
             tt = expand_mmac_params(tt);
             tt = expand_smacro(tt);
-            /* Why dup_tlist() here? We should own tt... */
-            dup_tlist(tt, &tail);
+            *tail = tt;
+            list_for_each(teach, tt) {
+               tail = &teach->next;
+            }
+            /* dup_tlist(tt, &tail);
+            free_tlist(tt); */
             text = NULL;
             change = true;
             break;
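
The idea of this patch, handing the list over instead of copying it, can be sketched on a toy linked list as follows. The node type and list_splice are made up for the example; the loop plays the role of the list_for_each walk in the diff above.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

/*
 * Link `list` in at **tailp and advance *tailp to the next-pointer of
 * its last node. Ownership of the nodes moves to the output list, so
 * nothing is copied and nothing needs to be freed afterwards.
 */
void list_splice(struct node *list, struct node ***tailp)
{
    struct node **tail;

    assert(tailp && *tailp);
    tail = *tailp;
    *tail = list;
    while (*tail)
        tail = &(*tail)->next;
    *tailp = tail;
}
```

The walk to the end is still linear in the length of the spliced list, but it avoids one allocation and one deletion per token compared to duplicating and then freeing.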
Comment 9 E. C. Masloch 2022-08-27 11:54:40 PDT
Patch could probably use list_last instead: https://github.com/netwide-assembler/nasm/blob/3aebb20f123033dcd767f0abc46b18cbefed8091/include/nasmlib.h#L282

But that's just optimisation.
Comment 10 H. Peter Anvin 2022-11-21 12:10:19 PST
Fix checked in. Huge thanks for tracking this down!